Abstract

This paper investigates cooperative estimation of 3D target object motion for visual sensor networks. In particular, we consider the situation where multiple smart vision cameras see a group of target objects. The objective is to meet two requirements simultaneously: averaging for static objects and tracking of moving target objects. For this purpose, we present a cooperative estimation mechanism called the networked visual motion observer. We then derive an upper bound on the ultimate error between the actual average and the estimates produced by the present networked estimation mechanism. Moreover, we analyze the tracking performance of the estimates for moving target objects. Finally, the effectiveness of the networked visual motion observer is demonstrated through simulation.



Index Terms

Cooperative estimation, Visual-based observer, Averaging, Passivity, Visual sensor network

I. Introduction

A visual sensor network [1], [2] is a kind of wireless sensor network consisting of spatially distributed smart cameras with communication and computation capability. Unlike other sensors measuring values such as temperature and pressure, vision sensors do not provide explicit data, but combining them with image processing techniques or human operators gives rich information for situation awareness, such as what happens, what a target is, where it is and where it is heading. Owing to their nature, visual sensor networks are useful in environmental monitoring, surveillance, target tracking and entertainment, and are expected to become a component of sustainable infrastructures.
A lot of research has been devoted to fusing control techniques with visual information, so-called visual feedback control or images in the loop [3]-[9]. The motivating scenarios of this fusion have spread from robotic systems to security and surveillance systems, medical imaging procedures, human-in-the-loop systems and even the understanding of biological perceptual information processing. Driven by technological innovations in smart wearable cameras, the aforementioned networked vision system is also emerging as a challenging new application field of visual feedback control and estimation.

In this paper, we focus on estimation of 3D rigid body motion as in [3]-[9], and reconsider the problem not for a single camera system but for networked vision systems. In particular, we aim at an extension of [8] from a single camera to visual sensor networks, where the paper [8] presents a vision-based observer, called the visual motion observer, estimating 3D target object motion from 2D vision data. In visual sensor networks, it is expected not only that an estimate is produced but also that the vision cameras cooperate with each other in an efficient manner, which brings new theoretical challenges. The advantages of cooperation are: (i) accurate estimation by integrating rich information, (ii) tolerance against obstruction, misdetection in image processing and sensor failures and (iii) wide vision and elimination of blind areas by fusing images of a scene from a variety of viewpoints. To tackle such distributed estimation problems, cooperative control as in [10]-[15] provides useful methodologies. In this paper, we especially focus on the passivity-based cooperative control schemes investigated in [12]-[15].

Cooperative estimation for sensor networks has been addressed in [16]-[24]. The main objective of these works is to average the local measurements or local estimates among sensors in a distributed fashion in order to improve estimation accuracy. For this purpose, most of the works utilize the consensus protocol [10] in the update of the local estimates. While [16], [17] assume that the parameters to be estimated are fixed, [18]-[24] address estimation of dynamic parameters assuming that the parameters follow some dynamical system. Among them, [18]-[22] execute a large number of consensus iterations between each update of the estimates, which is hardly applicable to dynamic estimation problems except for the case of slow dynamics. Meanwhile, [23] and [24] present estimation algorithms that do not use such iterations. Unfortunately, most of these algorithms are not applicable to our problem, since the object's pose takes values in a non-Euclidean space and the consensus scheme on a vector space [10] does not work there.

Meanwhile, average computation in the group of rotations is tackled by [17], [25], [26]. The paper [25] defines two types of average rotations, the Euclidean and Riemannian means, and derives their fundamental properties. Reference [26] presents a computational algorithm for the Riemannian mean and analyzes its convergence. The paper [17] presents a distributed version of the algorithm in [26] based on the consensus protocol [10], which is motivated by visual sensor networks. However, [17] focuses on averaging by assuming that the target orientations are obtained a priori, and the scheme cannot essentially be extended to dynamic estimation problems.

In this paper, we present a novel cooperative estimation mechanism called the networked visual motion observer. We consider the situation where multiple smart vision cameras capture a group of target objects. In this situation, the objective of the present estimation mechanism is to meet two requirements simultaneously: averaging for static objects, which means gaining estimates close to an average of multiple target objects' poses, and tracking of moving target objects, which means that the estimates track the moving average within a bounded error. Namely, the present mechanism deals with both static and dynamic estimation problems. For this purpose, we first present the networked visual motion observer, which consists of visual feedback and mutual feedback from neighboring vision cameras, based on the passivity-based visual motion observer [8] and the passivity-based pose synchronization law presented in [15].

We next evaluate the averaging performance attained by the networked visual motion observer. For this purpose, we define a notion of approximate averaging by using the ultimate error between the actual average and the estimates produced by the present observer. Then, we derive an upper bound on the ultimate error; a partial solution is already given in [27], [28], and this paper provides its generalized version. The result gives an insight into gain selection: average estimation becomes accurate if the mutual feedback is much stronger than the visual feedback.

We moreover evaluate the tracking performance of the estimates for moving target objects. Here, we view the body velocities of the target objects as a disturbance to the total networked system and evaluate the ultimate distance from the estimates to the average. The result shows that choosing a large visual feedback gain yields good tracking performance.

Finally, we demonstrate the effectiveness of the present networked visual motion observer and 
validity of the theoretical results through simulation. 

The organization of this paper is as follows. Section II explains the situation under consideration in this paper and formulates the visual sensor networks together with the objective to be met. In Section III, after introducing the visual motion observer [8], we present the networked visual motion observer. Section IV clarifies the accuracy of the average estimation when the present estimation mechanism is applied to the network of vision cameras. Section V clarifies the tracking performance of the estimates when the target objects are moving. Verifications through simulation are shown in Section VI. Finally, Section VII draws conclusions.

We finally give some notations used in this paper; readers are referred to [3] for details on the terminology. Throughout this paper, we use the notation $e^{\hat\xi\theta_{ab}} \in \mathbb{R}^{3\times 3}$ to represent the rotation matrix of a frame $\Sigma_b$ relative to a frame $\Sigma_a$, which is orthogonal with unit determinant and hence an element of the Lie group $SO(3) := \{R \in \mathbb{R}^{3\times 3} \mid R^T R = I_3 \text{ and } \det(R) = +1\}$. The vector $\xi_{ab} \in \mathbb{R}^3$ specifies the rotation axis and $\theta_{ab} \in \mathbb{R}$ is the rotation angle. For simplicity we use $\hat\xi\theta_{ab}$ to denote $\hat\xi_{ab}\theta_{ab}$. The configuration space of the rigid body motion is the product space $SE(3) := \mathbb{R}^3 \times SO(3)$. We use the $4 \times 4$ matrix

$$g_{ab} = \begin{bmatrix} e^{\hat\xi\theta_{ab}} & p_{ab} \\ 0 & 1 \end{bmatrix}$$

as the homogeneous representation of $g_{ab} = (p_{ab}, e^{\hat\xi\theta_{ab}}) \in SE(3)$ describing the configuration of $\Sigma_b$ relative to $\Sigma_a$. The notation '$\wedge$' is the operator such that $\hat a b = a \times b$ for the vector cross-product $\times$, i.e. $\hat a$ is a $3 \times 3$ skew-symmetric matrix. The vector space of all $3 \times 3$ skew-symmetric matrices is denoted by $so(3)$. The notation '$\vee$' denotes the inverse operator to '$\wedge$'. Similarly to the definition of $so(3)$, we define $se(3) := \{(v, \hat\omega) : v \in \mathbb{R}^3, \hat\omega \in so(3)\}$. In homogeneous representation, we write an element $V := (v, \omega)$ as

$$\hat V = \begin{bmatrix} \hat\omega & v \\ 0 & 0 \end{bmatrix}.$$
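The wedge and vee operators and the homogeneous representations above map directly onto small matrix helpers. The following Python sketch is only an illustration under our own naming (the functions `hat`, `vee`, `hat_se3` and `homogeneous` are hypothetical and do not appear in the paper); matrices are stored as nested lists.

```python
def hat(a):
    """Wedge operator: 3-vector -> skew-symmetric matrix with hat(a) b = a x b."""
    ax, ay, az = a
    return [[0.0, -az, ay],
            [az, 0.0, -ax],
            [-ay, ax, 0.0]]


def vee(m):
    """Vee operator: inverse of hat, skew-symmetric matrix -> 3-vector."""
    return [m[2][1], m[0][2], m[1][0]]


def mat_vec(m, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def cross(a, b):
    """Vector cross product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def hat_se3(v, w):
    """4x4 homogeneous form of V = (v, w): [[hat(w), v], [0, 0]]."""
    W = hat(w)
    return [W[0] + [v[0]], W[1] + [v[1]], W[2] + [v[2]], [0.0, 0.0, 0.0, 0.0]]


def homogeneous(p, R):
    """4x4 homogeneous form of g = (p, R): [[R, p], [0, 1]]."""
    return [list(R[0]) + [p[0]], list(R[1]) + [p[1]], list(R[2]) + [p[2]],
            [0.0, 0.0, 0.0, 1.0]]
```

For instance, `mat_vec(hat(a), b)` agrees with `cross(a, b)`, and `vee(hat(a))` returns `a`.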



II. Preparation for Visual Sensor Networks 

Let us consider the situation where $n$ vision cameras $\mathcal{V} := \{1, \dots, n\}$ with communication and computation capability see a group of target objects $\{o_i\}_{i \in \mathcal{V}}$ (Fig. 1), where each vision camera $i \in \mathcal{V}$ captures object $o_i$ on its image plane. Throughout this paper, we use pinhole-type vision cameras with perspective projection [3] as in Fig. 2. Note, however, that all of the subsequent discussions are applicable to panoramic cameras through the modifications in [29].

In this paper, we address estimation of the average motion of the objects $\{o_i\}_{i \in \mathcal{V}}$. The problem includes the scenario in which all the cameras see a common single target object but the pose consistent with the vision data differs from camera to camera due to incomplete localization and parametric uncertainties. In such a situation, averaging the contaminated poses is a way to improve estimation accuracy [20].
Fig. 1. Visual Sensor Networks

Fig. 2. Vision Camera Model



A. Rigid Body Motion 

Let the coordinate frames $\Sigma_w$, $\Sigma_i$ and $\Sigma_{o_i}$ represent the world frame, the $i$-th vision camera frame, and the frame of object $o_i$, respectively. The poses of vision camera $\Sigma_i$ and object $\Sigma_{o_i}$ relative to the world frame $\Sigma_w$ are denoted by $g_{wi} = (p_{wi}, e^{\hat\xi\theta_{wi}}) \in SE(3)$ and $g_{wo_i} = (p_{wo_i}, e^{\hat\xi\theta_{wo_i}}) \in SE(3)$. Then, the pose of $\Sigma_{o_i}$ relative to $\Sigma_i$, denoted by $g_{io_i} = (p_{io_i}, e^{\hat\xi\theta_{io_i}}) \in SE(3)$, can be represented as $g_{io_i} = g_{wi}^{-1} g_{wo_i}$.

We next define the body velocity of object $o_i$ relative to the world frame $\Sigma_w$ as $V^b_{wo_i} = (v_{wo_i}, \omega_{wo_i}) \in \mathbb{R}^6$, where $v_{wo_i}$ and $\omega_{wo_i}$ respectively represent the linear and angular velocities of the origin of $\Sigma_{o_i}$ relative to $\Sigma_w$ [3]. Similarly, vision camera $i$'s body velocity relative to $\Sigma_w$ will be denoted as $V^b_{wi} = (v_{wi}, \omega_{wi}) \in \mathbb{R}^6$.

By using the body velocities $V^b_{wi}$ and $V^b_{wo_i}$, the motion of the relative pose $g_{io_i}$ is written as

$$\dot g_{io_i} = -\hat V^b_{wi}\, g_{io_i} + g_{io_i}\, \hat V^b_{wo_i} \tag{1}$$

[8]. Equation (1) is called the relative rigid body motion, whose block diagram is depicted in Fig. 3.
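The relation between the world-frame poses and the relative pose used in this subsection can be checked numerically. The sketch below is a minimal illustration, assuming poses are stored as pairs `(p, R)` of a position list and a 3x3 rotation given as nested lists; the helper names are hypothetical and chosen only for this example.

```python
def mat_t(R):
    """Transpose of a 3x3 matrix (the inverse, if R is a rotation)."""
    return [list(col) for col in zip(*R)]


def mat_vec(R, v):
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def se3_inverse(g):
    """Inverse of g = (p, R): (-R^T p, R^T)."""
    p, R = g
    Rt = mat_t(R)
    return ([-x for x in mat_vec(Rt, p)], Rt)


def se3_compose(g1, g2):
    """Product g1 g2 = (R1 p2 + p1, R1 R2)."""
    (p1, R1), (p2, R2) = g1, g2
    Rp = mat_vec(R1, p2)
    return ([Rp[i] + p1[i] for i in range(3)], mat_mul(R1, R2))


def relative_pose(g_wi, g_wo):
    """Pose of the object seen from camera i: g_io = g_wi^{-1} g_wo."""
    return se3_compose(se3_inverse(g_wi), g_wo)
```

As a usage example, a camera at position (1, 0, 0) rotated by 90 degrees about the z-axis sees an object located at (1, 2, 0) in the world frame at (2, 0, 0) in its own frame.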



B. Visual Measurement 

In this subsection, we define the visual measurements of each vision camera that are available for estimation. We assume that each target object has $m$ feature points and that each vision camera can extract them from the vision data by using techniques such as [30]. The position vectors of target object $i$'s $l$-th feature point relative to $\Sigma_{o_i}$ and $\Sigma_i$ are denoted by $p_{o_i l} \in \mathbb{R}^3$ and $p_{il} \in \mathbb{R}^3$, respectively. Using a transformation of the coordinates, we have $p_{il} = g_{io_i} p_{o_i l}$, where $p_{o_i l}$ and $p_{il}$ should be regarded, with a slight abuse of notation, as $[p_{o_i l}^T\ 1]^T$ and $[p_{il}^T\ 1]^T$.
Fig. 3. Block Diagram of Relative Rigid Body Motion

Fig. 4. Block Diagram of the RRBM with Vision Camera (RRBM is an acronym for Relative Rigid Body Motion)



Let the $m$ feature points of object $o_i$ on the image plane coordinate be the measurement $f_i$ of camera $i$, which is given by the perspective projection [3] with a focal length $\lambda_i$ as

$$f_i := [f_{i1}^T \ \cdots \ f_{im}^T]^T \in \mathbb{R}^{2m}, \quad f_{il} = \frac{\lambda_i}{z_{il}} [x_{il} \ \ y_{il}]^T, \quad p_{il} = [x_{il} \ \ y_{il} \ \ z_{il}]^T. \tag{2}$$

Under the assumption that each camera $i$ knows the location of the feature points $p_{o_i l} \in \mathbb{R}^3$, the visual measurement $f_i$ depends only on the relative pose $g_{io_i}$ from (2) and $p_{il} = g_{io_i} p_{o_i l}$. Fig. 4 shows the block diagram of the relative rigid body motion with the camera model.
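To make the measurement model concrete, the following Python sketch evaluates the perspective projection of the feature points for a given relative pose. It is a minimal illustration only: the function names, the nested-list data layout and the check that points lie in front of the camera are our own assumptions.

```python
def project(points_cam, focal_length):
    """Stack f_l = (lambda / z_l) [x_l, y_l] over all feature points."""
    f = []
    for x, y, z in points_cam:
        if z <= 0:
            raise ValueError("feature point must lie in front of the camera")
        f.extend([focal_length * x / z, focal_length * y / z])
    return f


def visual_measurement(p_io, R_io, features_obj, focal_length):
    """Measurement f_i from the relative pose (p_io, R_io) and object-frame features."""
    # p_il = R_io p_{o_i l} + p_io, i.e. g_io acting on the feature point
    points_cam = [
        [sum(R_io[r][c] * q[c] for c in range(3)) + p_io[r] for r in range(3)]
        for q in features_obj
    ]
    return project(points_cam, focal_length)
```

With an identity orientation, a relative position (0, 0, 4), a focal length of 2 and feature points (1, 2, 0) and (-1, 0, 4) in the object frame, the stacked measurement is [0.5, 1.0, -0.25, 0.0].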



C. Communication 

The vision cameras have communication capability with the neighboring cameras and form a network. The communication is modeled by a digraph $G = (\mathcal{V}, \mathcal{E})$, where $\mathcal{E} \subset \mathcal{V} \times \mathcal{V}$, as in the left figure of Fig. 5. Namely, vision camera $i$ can get some information from $j$ if $(j, i) \in \mathcal{E}$. In addition, we define the neighbor set $\mathcal{N}_i$ of vision camera $i \in \mathcal{V}$ as

$$\mathcal{N}_i := \{ j \in \mathcal{V} \mid (j, i) \in \mathcal{E} \}. \tag{3}$$



Let us now employ the following assumption on the graph G. 

Assumption 1: The communication graph G is fixed, balanced and strongly connected. 
The balanced and strongly connected graph is a graph such that there exists at least one directed 
path between any pair of nodes and the in-degree and out-degree are equal for all nodes [fTT|. 
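The neighbor set and the two graph properties in Assumption 1 are easy to check for a concrete edge list. The sketch below is a minimal illustration, assuming edges are stored as pairs `(j, i)` meaning that camera `i` receives information from camera `j`; the function names are hypothetical.

```python
def neighbors(edges, i):
    """Neighbor set N_i = {j : (j, i) in E}."""
    return {j for (j, k) in edges if k == i}


def is_balanced(nodes, edges):
    """In-degree equals out-degree at every node."""
    return all(
        sum(1 for (_, k) in edges if k == v) == sum(1 for (j, _) in edges if j == v)
        for v in nodes
    )


def _reachable(edges, start):
    """Set of nodes reachable from start along directed edges."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for (j, k) in edges:
            if j == u and k not in seen:
                seen.add(k)
                stack.append(k)
    return seen


def is_strongly_connected(nodes, edges):
    """Every node reaches, and is reached from, an arbitrary reference node."""
    nodes = list(nodes)
    reverse = {(k, j) for (j, k) in edges}
    everything = set(nodes)
    return (_reachable(edges, nodes[0]) == everything
            and _reachable(reverse, nodes[0]) == everything)
```

A directed ring 1 -> 2 -> 3 -> 1 passes both checks, whereas a directed chain 1 -> 2 -> 3 passes neither.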

We also denote by $G_u$ the undirected graph produced by replacing all the directed edges of $G$ by undirected ones. Let $\mathcal{T}(i_0)$ be the set of all spanning trees over $G_u$ with a root $i_0 \in \mathcal{V}$, and consider an element $G_T = (\mathcal{V}, \mathcal{E}_T) \in \mathcal{T}(i_0)$. Let the path from $i_0$ to a node $i \in \mathcal{V}$ along the tree $G_T$ be denoted by

$$P_{G_T}(i) = (v_0, \dots, v_{d_{G_T}(i)}), \quad v_0 = i_0, \quad v_{d_{G_T}(i)} = i, \quad \{v_l, v_{l+1}\} \in \mathcal{E}_T \ \ \forall l \in \{0, \dots, d_{G_T}(i) - 1\},$$

where $d_{G_T}(i)$ denotes the length of the path $P_{G_T}(i)$. We also define

$$\delta_{G_T}(E; i) := \begin{cases} 1, & \text{if the path } P_{G_T}(i) \text{ includes edge } E \\ 0, & \text{otherwise} \end{cases}$$

for any $E \in \mathcal{E}_T$. By using the above notations, we define

$$W := \min_{i_0 \in \mathcal{V}} D(i_0), \quad D(i_0) := \min_{G_T \in \mathcal{T}(i_0)} \tilde D(G_T), \quad \tilde D(G_T) := \max_{E \in \mathcal{E}_T} \sum_{i \in \mathcal{V}} \delta_{G_T}(E; i)\, d_{G_T}(i). \tag{4}$$

Fig. 5. Left: Communication Graph, Middle: Tree with Root 1 Minimizing $\tilde D$ ($D = 8$), Right: Tree Minimizing $D$ ($W = 3$)

For example, let us consider the communication graph in Fig. 5 (Left). Suppose that we choose $i_0 = 1$ and build the tree depicted in the middle figure of Fig. 5, where the number next to each edge $E$ is the value of $\sum_{i \in \mathcal{V}} \delta_{G_T}(E; i)\, d_{G_T}(i)$. Namely, $\tilde D$ is equal to 8 for the tree, and this is actually minimal over all spanning trees in $\mathcal{T}(1)$. However, choosing another node as the root can reduce the value of $\tilde D$. Indeed, as illustrated in the right figure of Fig. 5, a tree with $i_0 = 3$ achieves $\tilde D = 3$, which is the minimal $D(i_0)$ among all the choices of the root $i_0$.
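For small networks, the quantity W in (4) can be evaluated by brute force. The sketch below is one possible way to do so, assuming the undirected graph has at least two nodes and few enough edges that enumerating all spanning trees is affordable; the function names are hypothetical, and the enumeration is our own choice rather than a procedure given in the paper.

```python
from collections import deque
from itertools import combinations


def rooted_tree(nodes, tree_edges, root):
    """BFS over undirected edges; returns (depth, parent), or None if not spanning."""
    adj = {v: [] for v in nodes}
    for a, b in tree_edges:
        adj[a].append(b)
        adj[b].append(a)
    depth, parent = {root: 0}, {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in depth:
                depth[w] = depth[u] + 1
                parent[w] = u
                queue.append(w)
    return (depth, parent) if len(depth) == len(nodes) else None


def d_tilde(nodes, tree_edges, root):
    """max over tree edges E of sum_i delta(E; i) d(i), or None if not a spanning tree."""
    result = rooted_tree(nodes, tree_edges, root)
    if result is None:
        return None
    depth, parent = result
    # Each tree edge is identified by its endpoint farther from the root.
    load = {v: 0 for v in nodes if v != root}
    for i in nodes:
        v = i
        while parent[v] is not None:  # walk the path from node i back to the root
            load[v] += depth[i]
            v = parent[v]
    return max(load.values())


def compute_W(nodes, undirected_edges):
    """Minimize d_tilde over all spanning trees and all roots."""
    best = None
    for tree in combinations(undirected_edges, len(nodes) - 1):
        for root in nodes:
            value = d_tilde(nodes, tree, root)
            if value is not None and (best is None or value < best):
                best = value
    return best
```

For a path graph 1 - 2 - 3, rooting the only spanning tree at node 1 gives the value 3, while rooting it at the middle node gives 1, so W = 1. For a cycle of four nodes, every spanning tree is a path and the best root is an interior node, which gives W = 3.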

D. Average on 5*0(3) and SE{3) 

In this paper, the tuple of the relative rigid body motion (1), the visual measurement (2) and the communication structure (3) is called a visual sensor network. The objective of this paper is to present a cooperative estimation mechanism for visual sensor networks meeting the following requirements simultaneously: averaging for static objects, which means that each camera $i$ estimates a pose close to an average of $\{g_{io_j}\}_{j \in \mathcal{V}}$, $g_{io_j} := g_{wi}^{-1} g_{wo_j}$; and tracking of moving objects, which means that the estimates track the moving average pose within a bounded tracking error.

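Let us now introduce the following mean $g^*$ on $SE(3)$ as an average of the target poses $\{g_{wo_j}\}_{j \in \mathcal{V}}$:

$$g^* = (p^*, e^{\hat\xi\theta^*}) := \arg\min_{g \in SE(3)} \sum_{j \in \mathcal{V}} \psi(g^{-1} g_{wo_j}), \tag{5}$$

where the function $\psi$ is defined for any $g = (p, e^{\hat\xi\theta}) \in SE(3)$ as

$$\psi(g) := \frac{1}{2}\|I_4 - g\|_F^2 = \frac{1}{2}\|p\|^2 + \phi(e^{\hat\xi\theta}), \quad \phi(e^{\hat\xi\theta}) := \frac{1}{2}\|I_3 - e^{\hat\xi\theta}\|_F^2 = \mathrm{tr}(I_3 - e^{\hat\xi\theta}) \tag{6}$$

and $\|M\|_F$ is the Frobenius norm of a matrix $M$. Hereafter, we also use the notation

$$g_i^* = (p_i^*, e^{\hat\xi\theta_i^*}) := \arg\min_{g_i \in SE(3)} \sum_{j \in \mathcal{V}} \psi(g_i^{-1} g_{io_j}) = g_{wi}^{-1} g^*.$$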
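The two expressions for the rotation error function in (6) can be compared numerically. The following Python sketch is a small sanity check under our own naming (`rot_z`, `phi_trace`, `phi_frobenius` and `psi` are hypothetical helpers); for a rotation by an angle theta, both expressions reduce to 2(1 - cos theta).

```python
import math


def rot_z(theta):
    """Rotation by theta about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def phi_trace(R):
    """phi(R) = tr(I_3 - R)."""
    return sum(1.0 - R[i][i] for i in range(3))


def phi_frobenius(R):
    """phi(R) = (1/2) ||I_3 - R||_F^2, which equals the trace form for rotations."""
    return 0.5 * sum(
        ((1.0 if i == j else 0.0) - R[i][j]) ** 2
        for i in range(3) for j in range(3)
    )


def psi(p, R):
    """psi(g) = (1/2) ||p||^2 + phi(R) for g = (p, R)."""
    return 0.5 * sum(x * x for x in p) + phi_trace(R)
```

The equality of the two forms relies on R being orthogonal, so it should not be expected to hold for an arbitrary 3x3 matrix.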

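The position average $p^*$ is equal to the arithmetic mean $p^* = \frac{1}{n}\sum_{j \in \mathcal{V}} p_{wo_j}$ of the target positions $\{p_{wo_j}\}_{j \in \mathcal{V}}$, and the orientation average $e^{\hat\xi\theta^*}$ is the so-called Euclidean mean [25] of the target orientations $\{e^{\hat\xi\theta_{wo_j}}\}_{j \in \mathcal{V}}$ defined by

$$e^{\hat\xi\theta^*} := \arg\min_{e^{\hat\xi\theta} \in SO(3)} \sum_{j \in \mathcal{V}} \phi(e^{-\hat\xi\theta} e^{\hat\xi\theta_{wo_j}}). \tag{7}$$

It is known [25] that the Euclidean mean $e^{\hat\xi\theta^*}$ is given by

$$e^{\hat\xi\theta^*}(t) = \mathrm{Proj}(S(t)), \quad S(t) := \frac{1}{n}\sum_{i \in \mathcal{V}} e^{\hat\xi\theta_{wo_i}}(t). \tag{8}$$

Here, $\mathrm{Proj}(M)$ is the orthogonal projection of $M \in \mathbb{R}^{3 \times 3}$ onto $SO(3)$, which is given by $U_M V_M^T$ for the matrix $M$ with singular value decomposition $M = U_M \Sigma_M V_M^T$ [25].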
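The Euclidean mean can be computed numerically in a few lines. The sketch below deliberately swaps the singular value decomposition for a Newton iteration on the polar decomposition, X <- (X + X^{-T})/2, which converges to the orthogonal polar factor of a nonsingular matrix; for a matrix with positive determinant this factor coincides with the product of the two orthogonal SVD factors. This substitution is our own choice to keep the example dependency-free, and all function names are hypothetical.

```python
import math


def rot_z(theta):
    """Rotation by theta about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transpose(M):
    return [list(col) for col in zip(*M)]


def inverse3(M):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular; the projection is not well defined")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def polar_rotation(S, iterations=50):
    """Orthogonal polar factor of S by the Newton iteration X <- (X + X^{-T}) / 2."""
    X = [row[:] for row in S]
    for _ in range(iterations):
        Y = transpose(inverse3(X))
        X = [[0.5 * (X[r][c] + Y[r][c]) for c in range(3)] for r in range(3)]
    return X


def euclidean_mean(rotations):
    """Project the arithmetic mean S of the rotation matrices back onto SO(3)."""
    n = len(rotations)
    S = [[sum(R[r][c] for R in rotations) / n for c in range(3)] for r in range(3)]
    return polar_rotation(S)
```

As a usage example, the Euclidean mean of rotations by 0.2 and 0.6 rad about the same axis is, up to rounding, the rotation by 0.4 rad about that axis.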

Remark 1: Just computing the Euclidean mean is not difficult, even in a distributed fashion, if we have prior knowledge that the target objects are static. Indeed, the matrix $S$ can be computed by using the consensus protocol under appropriate assumptions on the graph [10], and the operation Proj can be executed locally. However, such a scheme works only for static objects and never embodies the tracking nature required for moving target objects. The objective here is to present an estimation mechanism that uses neither prior knowledge nor any decision-making process on whether the targets are static or moving.

III. Networked Visual Motion Observer 
A. Visual Motion Observer 

In this subsection, we consider the problem in which vision camera $i$ estimates the target object motion $g_{io_i}$ from the visual measurements $f_i$ without considering communication. For this purpose, we introduce the visual motion observer presented in [8].

We first prepare a model of the rigid body motion (1), similarly to the Luenberger observer, as

$$\dot{\bar g}_{io_i} = -\hat V^b_{wi}\, \bar g_{io_i} + \bar g_{io_i}\, \hat u_{ei}, \tag{9}$$
Fig. 6. Estimation Error System

Fig. 7. Visual Motion Observer



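where $\bar g_{io_i} = (\bar p_{io_i}, e^{\hat{\bar\xi}\bar\theta_{io_i}})$ is the estimate of the actual relative pose $g_{io_i}$. The input $u_{ei} = (v_{uei}, \omega_{uei})$ is to be determined so as to drive the estimated value $\bar g_{io_i}$ to the actual $g_{io_i}$.

In order to establish the estimation error system, we define the estimation error between the estimated value $\bar g_{io_i}$ and the actual relative rigid body motion $g_{io_i}$ as $g_{ei} = (p_{ei}, e^{\hat\xi\theta_{ei}}) := \bar g_{io_i}^{-1} g_{io_i}$. Using the notations $e_R(e^{\hat\xi\theta}) := \mathrm{sk}(e^{\hat\xi\theta})^\vee$ and $\mathrm{sk}(e^{\hat\xi\theta}) := \frac{1}{2}(e^{\hat\xi\theta} - e^{-\hat\xi\theta})$, the vector representation of the estimation error $g_{ei}$ is given by

$$e_{ei} := E_R(g_{ei}), \quad E_R(g_{ei}) := \begin{bmatrix} p_{ei} \\ e_R(e^{\hat\xi\theta_{ei}}) \end{bmatrix}. \tag{10}$$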
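The error vector in (10) is straightforward to evaluate once the error pose is available. The sketch below is a minimal illustration with hypothetical helper names; it uses the fact that the inverse of a rotation matrix is its transpose, so sk(R) is the skew-symmetric part of R. For a rotation by an angle theta about a unit axis, the orientation part equals the axis scaled by sin(theta).

```python
import math


def rot_z(theta):
    """Rotation by theta about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def vee(M):
    """Skew-symmetric matrix -> 3-vector."""
    return [M[2][1], M[0][2], M[1][0]]


def sk(R):
    """sk(R) = (R - R^T) / 2, the skew-symmetric part of R."""
    return [[0.5 * (R[i][j] - R[j][i]) for j in range(3)] for i in range(3)]


def e_R(R):
    """Orientation error vector sk(R)^vee."""
    return vee(sk(R))


def E_R(p, R):
    """Stacked 6-dimensional error vector [p; e_R(R)] of the pose g = (p, R)."""
    return list(p) + e_R(R)
```

Note that the orientation part vanishes not only at the identity but also for a rotation by pi, which is one reason why conditions restricting the rotation errors appear in the later analysis.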



Once the estimate $\bar g_{io_i}$ is determined, the estimated measurement $\bar f_i$ is also computed by (2). Let us now define the visual measurement error as $f_{ei} := f_i(g_{io_i}) - \bar f_i(\bar g_{io_i})$. Then, the measurement error vector $f_{ei}$ can be approximately given by $f_{ei} = J_i(\bar g_{io_i}) e_{ei}$ [8], where $J_i(\bar g_{io_i}) : SE(3) \to \mathbb{R}^{2m \times 6}$ is the well-known image Jacobian. Now, if $m \ge 4$, the image Jacobian has full column rank and the estimation error vector $e_{ei}$ is reconstructed as

$$e_{ei} = J_i^\dagger(\bar g_{io_i}) f_{ei}, \tag{11}$$

where $\dagger$ denotes the pseudo-inverse.

Differentiating $g_{ei} = \bar g_{io_i}^{-1} g_{io_i}$ with respect to time and using (1) and (9), we obtain the estimation error system

$$\dot g_{ei} = -\hat u_{ei}\, g_{ei} + g_{ei}\, \hat V^b_{wo_i}. \tag{12}$$

Fig. 6 shows the block diagram of the system (12). The paper [8] proves that if $V^b_{wo_i} = 0$, then the estimation error system (12) is passive from the input $u_{ei}$ to the output $-e_{ei}$.
Based on passivity-based control theory, we close the loop by using the input

$$u_{ei} = -k_e(-e_{ei}) = k_e e_{ei}, \quad k_e > 0. \tag{13}$$

Then, the resulting total estimation mechanism formulated as

$$\text{Visual Motion Observer:} \quad \begin{cases} \dot{\bar g}_{io_i} = -\hat V^b_{wi}\, \bar g_{io_i} + \bar g_{io_i}\, \hat u_{ei} & \cdots (9) \\ e_{ei} = J_i^\dagger(\bar g_{io_i}) f_{ei} & \cdots (11) \\ u_{ei} = k_e e_{ei} & \cdots (13) \end{cases} \tag{14}$$

is called the visual motion observer [9], whose block diagram is illustrated in Fig. 7. In terms of the mechanism, we immediately obtain the following facts from passivity.

Fact 1: [8] (i) If $V^b_{wo_i} = 0$, then the equilibrium point $e_{ei} = 0$ for the closed-loop system (12) with (13) is asymptotically stable. (ii) Given a positive scalar $\nu_i$, if $k_e$ satisfies $k_e - \frac{1}{2\nu_i^2} - \frac{1}{2} > 0$, then the system (12) and (13) with input $V^b_{wo_i}$ and output $e_{ei}$ has $L_2$-gain smaller than $\nu_i$.

Item (i) means that the visual motion observer leads the estimate $\bar g_{io_i}$ to the actual $g_{io_i}$ for a static object. Item (ii) implies that the observer also works for a moving target object, and the parameter $\nu_i$ is an index of the estimation accuracy when the observer is applied to a moving target.

B. Networked Visual Motion Observer 

The objective of this paper is to achieve averaging while preserving the tracking nature of the visual motion observer. For this purpose, this subsection presents a cooperative estimation mechanism under the assumptions that (i) each vision camera knows the relative pose $g_{ij} = g_{wi}^{-1} g_{wj}$ with respect to its neighbors $j \in \mathcal{N}_i$ and (ii) all the vision cameras are static, i.e. $V^b_{wi} = 0 \ \forall i \in \mathcal{V}$.

Under $V^b_{wi} = 0$, the relative rigid body motion (1) is simply given by $\dot g_{io_i} = g_{io_i} \hat V^b_{wo_i}$. Accordingly, the update procedure in (14) is reformulated as

$$\dot{\bar g}_{io_i} = \bar g_{io_i}\, \hat u_{ei}, \quad u_{ei} = k_e E_R(\bar g_{io_i}^{-1} g_{io_i}). \tag{15}$$

Then, the following proposition holds in terms of the procedure (15).

Proposition 1: [31] The update procedure (15) is a gradient descent algorithm on $SE(3)$ for the potential function $\psi(\bar g_{io_i}^{-1} g_{io_i})$, where the function $\psi$ is defined in (6).

Let us now view $\psi(\bar g_{io_i}^{-1} g_{io_i}) = \psi(\bar g_{wo_i}^{-1} g_{wo_i})$ as the local objective function to be minimized by vision camera $i$. Then, we see that the group objective (5) is given by the sum of the local objective functions over all $i \in \mathcal{V}$. Note that each vision camera does not know the local objectives of the other vision cameras. In such a situation, computing a solution minimizing the global objective function by using local negotiations is called a multi-agent optimization problem, and [32] presents an update rule for the local estimates of the solution that produces approximate solutions to the global objective by combining the gradient descent algorithm for the local objective function with the consensus protocol [10]. The present cooperative estimation mechanism is inspired by this algorithm, but the consensus protocol cannot be executed on $SE(3)$. We thus instead use the pose synchronization law presented in [15], which is also based on the passivity of rigid body motion.

We next present an update rule for the estimates $\bar g_{io_i}$ so as to estimate the average $g^*$. Each vision camera $i$ first gains the estimates $\bar g_{jo_j}$ from $j \in \mathcal{N}_i$ as messages. Now, by multiplying the known information $g_{ij}$ from the left, each vision camera $i$ gets $\bar g_{io_j} := g_{ij} \bar g_{jo_j}$ for all $j \in \mathcal{N}_i$. Using this information, the estimate $\bar g_{io_i}$ is updated according to (9) with

$$u_{ei} = k_e e_{ei} + k_s \sum_{j \in \mathcal{N}_i} E_R(\bar g_{io_i}^{-1} \bar g_{io_j}), \quad k_e > 0, \ k_s > 0. \tag{16}$$

Since $e_{ei}$ is reconstructed from the visual measurement $f_i$ by (11) and $\bar g_{io_j}$ is obtained through communication as stated above, the update procedure (16) is implementable.

The present input (16) consists of the visual feedback term $k_e e_{ei}$ and the mutual feedback term $k_s \sum_{j \in \mathcal{N}_i} E_R(\bar g_{io_i}^{-1} \bar g_{io_j})$, where the former is inspired by the visual motion observer [8] and the latter by the pose synchronization law [15]. Indeed, without the second term, the update rule (16) is the same as that of the visual motion observer (15). In addition, without the



July 28, 2011 



DRAFT 






visual feedback, the update procedure (16), namely $u_{ei} = k_s \sum_{j \in \mathcal{N}_i} E_R(\bar g_{io_i}^{-1} \bar g_{io_j})$, is essentially equivalent to the passivity-based pose synchronization law [15] of a group of rigid bodies with states $\bar g_{wo_i} := g_{wi} \bar g_{io_i}$. Thus, under appropriate assumptions, each state $\bar g_{wo_i}$ would converge to a state satisfying $\bar g_{wo_i} = \bar g_{wo_j} \ \forall i, j \in \mathcal{V}$ as time goes to infinity without the visual feedback term.
In other words, the visual motion observers are networked by the mutual feedback term in the total estimation mechanism formulated as

$$\text{Networked VMO:} \quad \begin{cases} \dot{\bar g}_{io_i} = \bar g_{io_i}\, \hat u_{ei} & \cdots (9) \\ e_{ei} = J_i^\dagger(\bar g_{io_i}) f_{ei} & \cdots (11) \\ u_{ei} = k_e e_{ei} + k_s \sum_{j \in \mathcal{N}_i} E_R(\bar g_{io_i}^{-1} \bar g_{io_j}) & \cdots (16) \end{cases} \quad \forall i \in \mathcal{V}, \tag{17}$$

where VMO is an acronym for Visual Motion Observer. This is why the estimation mechanism is called the networked visual motion observer. The block diagram of the total system of vision camera $i$ is illustrated in Fig. 8.
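The interplay of the two feedback terms can be illustrated on the position part alone. A short calculation from the definition of the error vector suggests that, for static cameras and with all positions expressed in the world frame, the position estimates obey d/dt p_i = k_e (p_target_i - p_i) + k_s sum_j (p_j - p_i), independently of the orientations. The Python sketch below integrates this reduced model with a forward Euler scheme; the reduction, the step size, the zero initial estimates and all names are our own assumptions, and the orientation dynamics and the image Jacobian are not modeled.

```python
def simulate_positions(targets, neighbors, k_e, k_s, dt=0.01, steps=2000):
    """Forward-Euler integration of the reduced position dynamics (world frame)."""
    n = len(targets)
    est = [[0.0, 0.0, 0.0] for _ in range(n)]  # initial estimates at the origin
    for _ in range(steps):
        new_est = []
        for i in range(n):
            u = [
                # visual feedback toward the camera's own target
                k_e * (targets[i][d] - est[i][d])
                # mutual feedback toward the neighbors' estimates
                + k_s * sum(est[j][d] - est[i][d] for j in neighbors[i])
                for d in range(3)
            ]
            new_est.append([est[i][d] + dt * u[d] for d in range(3)])
        est = new_est
    return est


def mean_position(points):
    """Arithmetic mean of a list of 3-vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(3)]
```

For three fully connected cameras with targets at (0, 0, 0), (3, 0, 0) and (0, 3, 0), choosing `k_s = 10` and `k_e = 1` drives every estimate to within about 0.07 of the average (1, 1, 0), whereas `k_s = 0` simply recovers each camera's own target. This matches the qualitative statement that strong mutual feedback relative to visual feedback favors averaging.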

IV. Averaging Performance Analysis 

In this section, we derive the ultimate estimation accuracy of the average $g^*$ achieved by the networked visual motion observer (17) under the following assumption.

Assumption 2:
(i) The target objects are static, i.e. $V^b_{wo_i} = 0 \ \forall i \in \mathcal{V}$.
(ii) There exists a pair $(i, j) \in \mathcal{V} \times \mathcal{V}$ such that $p_{wo_i} \neq p_{wo_j}$ and $e^{\hat\xi\theta_{wo_i}} \neq e^{\hat\xi\theta_{wo_j}}$.
(iii) $e^{-\hat\xi\theta^*} e^{\hat\xi\theta_{wo_i}} > 0$ for all $i \in \mathcal{V}$.$^1$

Moving target objects will be investigated in Section V. Item (ii) is assumed in order to avoid the meaningless problem in which $g_{wo_i} = g_{wo_j} \ \forall i, j \in \mathcal{V}$. Indeed, in that situation, it is straightforward to prove convergence of the estimates to the common pose. In terms of item (iii), we see that if $e^{-\hat\xi\theta_{wo_i}} e^{\hat\xi\theta_{wo_j}} > 0$ for all $i, j \in \mathcal{V}$, then the following inequality holds.

$$\phi(e^{-\hat\xi\theta^*} e^{\hat\xi\theta_{wo_i}}) \le \phi_m := \max_{i, j \in \mathcal{V}} \phi(e^{-\hat\xi\theta_{wo_i}} e^{\hat\xi\theta_{wo_j}}) \quad \forall i \in \mathcal{V} \tag{18}$$

Inequality (18) implies that if $e^{-\hat\xi\theta_{wo_i}} e^{\hat\xi\theta_{wo_j}} > 0 \ \forall i, j \in \mathcal{V}$ ($\phi_m$ is smaller than 2), then (iii) is satisfied. Thus, (iii) can be checked if set-valued prior information on the target orientations, i.e. an upper bound on $\phi_m$, is available.

$^1$Throughout this paper, we refer to a real matrix $M$, which is not necessarily symmetric, as a positive definite (positive semi-definite) matrix if and only if $x^T M x > 0$ ($x^T M x \ge 0$) for all nonzero vectors $x$.
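A. Definition of Averaging Performance

In this subsection, we introduce a notion of approximate averaging. For this purpose, we define the following sets for any positive parameter $\varepsilon$.

$$\Omega_p(\varepsilon) := \Big\{ (\bar p_{io_i})_{i \in \mathcal{V}} \ \Big| \ \frac{1}{2} \sum_{i \in \mathcal{V}} \|\bar p_{io_i} - p_i^*\|^2 \le \varepsilon \rho_p \Big\}, \quad \rho_p := \frac{1}{2} \sum_{i \in \mathcal{V}} \|p_{io_i} - p_i^*\|^2 \tag{19}$$

$$\Omega_R(\varepsilon) := \Big\{ (e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i \in \mathcal{V}} \ \Big| \ \sum_{i \in \mathcal{V}} \phi(e^{-\hat\xi\theta_i^*} e^{\hat{\bar\xi}\bar\theta_{io_i}}) \le \varepsilon \rho_R \Big\}, \quad \rho_R := \sum_{i \in \mathcal{V}} \phi(e^{-\hat\xi\theta_i^*} e^{\hat\xi\theta_{io_i}}) \tag{20}$$

Let us now define the $\varepsilon$-level averaging performance to be met by the estimates $\bar g_{io_i} = (\bar p_{io_i}, e^{\hat{\bar\xi}\bar\theta_{io_i}})$.

Definition 1: Given target poses $(g_{io_i})_{i \in \mathcal{V}}$, the position estimates $(\bar p_{io_i})_{i \in \mathcal{V}}$ are said to achieve $\varepsilon$-level averaging performance for a scalar $\varepsilon > 0$ if there exists a finite $T$ such that $(\bar p_{io_i}(t))_{i \in \mathcal{V}} \in \Omega_p(\varepsilon) \ \forall t \ge T$, and the orientation estimates $(e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i \in \mathcal{V}}$ are said to achieve $\varepsilon$-level averaging performance if there exists a finite $T$ such that $(e^{\hat{\bar\xi}\bar\theta_{io_i}}(t))_{i \in \mathcal{V}} \in \Omega_R(\varepsilon) \ \forall t \ge T$.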

In the absence of communication, each vision camera $i$ acquires no information on the target objects $o_j$, $j \neq i$. In that situation, all each vision camera can do is produce as accurate an estimate of the relative pose $g_{io_i}$ as possible. Namely, the parameters $\rho_p$ and $\rho_R$ specify the best performance of average estimation in the absence of communication. More specifically, since the visual motion observer (14) correctly estimates the static target object pose $g_{io_i}$ (Fact 1), the parameters $\rho_p$ and $\rho_R$ indicate the average estimation accuracy in the absence of the mutual feedback term of $u_{ei}$ in (16). Namely, the parameter $\varepsilon$ is an indicator of the improvement in average estimation accuracy gained by inserting the mutual feedback term $k_s \sum_{j \in \mathcal{N}_i} E_R(\bar g_{io_i}^{-1} \bar g_{io_j})$.
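The role of the level parameter for the position estimates can be illustrated by computing the ratio between the current averaging error and the no-communication baseline. The sketch below is a minimal illustration with a hypothetical function name; it assumes that, for each camera, the estimate, the true relative target position and the average are all expressed in that camera's frame, and that the targets do not all coincide with the average (so the baseline is nonzero).

```python
def position_level(estimates, targets, averages):
    """Smallest epsilon for which the position estimates currently lie in Omega_p(epsilon)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Current averaging error: (1/2) sum_i ||estimate_i - average_i||^2
    error = 0.5 * sum(sq_dist(e, a) for e, a in zip(estimates, averages))
    # Baseline rho_p: the same quantity with perfect estimates of each own target
    rho_p = 0.5 * sum(sq_dist(t, a) for t, a in zip(targets, averages))
    if rho_p == 0.0:
        raise ValueError("all targets coincide with the average; the ratio is undefined")
    return error / rho_p
```

A value of 1 corresponds to estimates that are exactly as far from the average as the individual targets, and values below 1 quantify the improvement obtained through cooperation. The definition in the text additionally requires the level to hold for all sufficiently large times, which this snapshot computation does not check.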

B. Auxiliary Results 

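In this subsection, we give some results necessary for proving the main result of this section.

Lemma 1: Suppose that the estimates $(\bar g_{io_i})_{i \in \mathcal{V}}$ are updated by the networked visual motion observer (17). Then, under Assumptions 1 and 2 and $e^{-\hat{\bar\xi}\bar\theta_{io_i}} e^{\hat\xi\theta_i^*} > 0 \ \forall t \ge 0$, for all $c > 0$, there exists a finite $t(c)$ such that $\phi(e^{-\hat\xi\theta_i^*} e^{\hat{\bar\xi}\bar\theta_{io_i}}) \le \phi(e^{-\hat\xi\theta^*} e^{\hat\xi\theta_{wo_h}}) + c \ \forall t \ge t(c), \ i \in \mathcal{V}$, where $h := \arg\max_{j \in \mathcal{V}} \phi(e^{-\hat\xi\theta^*} e^{\hat\xi\theta_{wo_j}})$.

Proof: See Appendix A. ■

Lemma 1 implies that each individual estimate $e^{\hat{\bar\xi}\bar\theta_{io_i}}$ gets at least as close to the average $e^{\hat\xi\theta_i^*}$ as the object whose orientation is farthest from the average. In addition, the proof of this lemma also means that the set

$$\mathcal{S} := \big\{ (e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i \in \mathcal{V}} \ \big| \ e^{-\hat{\bar\xi}\bar\theta_{io_i}} e^{\hat\xi\theta_i^*} > 0 \ \forall i \in \mathcal{V} \big\}$$

is positively invariant for the total system (17) under Assumption 2. Namely, if $e^{-\hat{\bar\xi}\bar\theta_{io_i}} e^{\hat\xi\theta_i^*} > 0$ is satisfied at the initial time, then $e^{-\hat{\bar\xi}\bar\theta_{io_i}} e^{\hat\xi\theta_i^*} > 0$ holds for all subsequent time.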

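We next have the following lemma.

Lemma 2: Suppose that the estimates $(\bar g_{io_i})_{i \in \mathcal{V}}$ are updated by the networked visual motion observer (17). Then, under Assumptions 1 and 2, if the initial estimates satisfy $(e^{\hat{\bar\xi}\bar\theta_{io_i}}(0))_{i \in \mathcal{V}} \in \mathcal{S}$, both of the estimates $(\bar p_{io_i})_{i \in \mathcal{V}}$ and $(e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i \in \mathcal{V}}$ achieve 1-level averaging performance.

Proof: See Appendix B. ■

This lemma is proved by using the energy functions

$$U_p := \frac{1}{2} \sum_{i \in \mathcal{V}} \|\bar p_{io_i} - p_i^*\|^2, \quad U_R := \sum_{i \in \mathcal{V}} \phi(e^{-\hat\xi\theta_i^*} e^{\hat{\bar\xi}\bar\theta_{io_i}}),$$

which are defined by the sum of the individual errors between the average and the estimate. The functions $U_p \ge 0$ and $U_R \ge 0$ are equal to 0 if and only if $\bar p_{io_i} = p_i^*$ and $e^{\hat{\bar\xi}\bar\theta_{io_i}} = e^{\hat\xi\theta_i^*} \ \forall i \in \mathcal{V}$, respectively. The selection of the energy functions is inspired by one of our previous works on pose synchronization [15], whose framework was originally presented in [13].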

Lemma 2 means that the average estimation as a group in the presence of communication is at least as accurate as in the absence of communication. However, this lemma does not say how accurate the estimates of the average produced by the networked visual motion observer are.

From Lemmas 1 and 2, the estimates $(\bar p_{io_i})_{i \in \mathcal{V}}$ and $(e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i \in \mathcal{V}}$ settle into $\Omega_p(1)$ and $\mathcal{S}^R := \mathcal{S} \cap \Omega_R(1)$ in finite time, respectively. Let us now define the following subsets of $\Omega_p(1)$ and $\mathcal{S}^R$.

iGV jgM 

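$$\mathcal{S}_2^p(k, \varepsilon) := \Omega_p(1) \setminus (\mathcal{S}_1^p(k) \cup \Omega_p(\varepsilon)), \quad \mathcal{S}_2^R(k, \varepsilon) := \mathcal{S}^R \setminus (\mathcal{S}_1^R(k) \cup \Omega_R(\varepsilon))$$

for some $\varepsilon \in [0, 1)$, where $\beta := 1 - \sqrt{2(\phi(e^{-\hat\xi\theta^*} e^{\hat\xi\theta_{wo_h}}) + c)}$ and $k = k_e / k_s$. Images of the subsets on the position space are depicted in Fig. 9. We see from the figure that

$$\Omega_p(1) \setminus (\mathcal{S}_1^p(k) \cup \mathcal{S}_2^p(k, \varepsilon)) \subseteq \Omega_p(\varepsilon), \quad \mathcal{S}^R \setminus (\mathcal{S}_1^R(k) \cup \mathcal{S}_2^R(k, \varepsilon)) \subseteq \Omega_R(\varepsilon). \tag{21}$$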
In terms of the subsets $\mathcal{S}_1^p(k)$ and $\mathcal{S}_1^R(k)$, we have the following lemma.

Lemma 3: Suppose that all the assumptions in Lemma 2 hold and $\beta > 0$. Then, the time derivatives of $U_p$ and $U_R$ along the trajectories of (17) are strictly negative as long as $(\bar p_{io_i})_{i \in \mathcal{V}} \in \mathcal{S}_1^p(k)$ and $(e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i \in \mathcal{V}} \in \mathcal{S}_1^R(k)$ respectively, at least after the time $t(c)$.

Proof: See Appendix C. ■

From (18), $\beta$ can be estimated from set-valued prior information on the target orientations, i.e. $\phi_m$.



C. Averaging Performance 

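We are now ready to state the main result of this section on the averaging accuracy attained by the networked visual motion observer (17).

Theorem 1: Suppose that all the assumptions in Lemma 2 hold. Then, for any $\varepsilon \in (0, 1)$, the position estimates $(\bar p_{io_i})_{i \in \mathcal{V}}$ achieve $\varepsilon_p$-level averaging performance with

$$\varepsilon_p := \begin{cases} 1 - (1 - \varepsilon)\left(1 - \sqrt{kW}\right), & \text{if } k < 1/W \\ 1, & \text{otherwise} \end{cases} \tag{22}$$

and the orientation estimates $(e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i \in \mathcal{V}}$ achieve $\varepsilon_R$-level averaging performance with

$$\varepsilon_R := \begin{cases} 1 - (1 - \varepsilon)\left(\sqrt{\beta} - \sqrt{kW}\right), & \text{if } k < \beta/W, \ \beta > 0 \\ 1, & \text{otherwise} \end{cases} \tag{23}$$

where $W$ is defined in (4).

Proof: See Appendix D. ■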

Suppose that $\varepsilon$ is taken sufficiently close to 0. Then, we see that both of the parameters $\varepsilon_p$ and $\varepsilon_R$ become small as the term $\sqrt{kW}$ approaches 0. Note that if we use a sufficiently small $k$ ($k_s \gg k_e$) in (16), the term is approximated by 0. Here, we see an essential difference between the position and orientation estimates. The definition of $\varepsilon_p$ with $\varepsilon \ll 1$ indicates that we can get an arbitrarily accurate estimate of the average $p^*$ by choosing a sufficiently small $k$. In contrast, we see from the definition of $\varepsilon_R$ that an offset associated with $\sqrt{\beta}$ ($< 1$) occurs for the orientation estimates regardless of the parameter $k$. From the definition $\beta := 1 - \sqrt{2(\phi(e^{-\hat\xi\theta^*} e^{\hat\xi\theta_{wo_h}}) + c)}$, if the target object's orientation $e^{\hat\xi\theta_{wo_h}}$ is sufficiently close to the average $e^{\hat\xi\theta^*}$, i.e. if $e^{\hat\xi\theta_{wo_i}}$ and $e^{\hat\xi\theta_{wo_j}}$ are close enough among all $i, j \in \mathcal{V}$ to approximate all the orientations by matrices on a tangent vector space of $SO(3)$ at $e^{\hat\xi\theta_{wo_i}}$, then $\beta$ becomes close to 1 and the average is accurately estimated by the networked visual motion observer (17). Otherwise, the accuracy might degrade, though it is at least as accurate as in the absence of communication.

V. Tracking Performance Analysis 

In this section, we analyze the tracking performance of the estimates $(\bar g_{io_i})_{i\in\mathcal V}$ to the average
$g^*$ for moving targets when the networked visual motion observer is applied to the visual sensor
network, under the following assumption.

Assumption 3:
(i) The target body velocities $V^b_{wo_i}(t)$, $i \in \mathcal V$, are continuous in $t$ and bounded as

11^(^)112 < K^ WojLM' <wl^^eV,t>0. (24) 

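(ii) For all $t \ge 0$, there exists $(i(t), j(t)) \in \mathcal V \times \mathcal V$ such that $p_{wo_{i(t)}} \ne p_{wo_{j(t)}}$ and $e^{\hat\xi\theta_{wo_{i(t)}}} \ne e^{\hat\xi\theta_{wo_{j(t)}}}$.
(iii) $e^{-\hat\xi\theta_{wo_i}}(t)\,e^{\hat\xi\theta_{wo_j}}(t) > 0$ for all $i, j \in \mathcal V$ and $t \ge 0$.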



A. Description of Average Motion 

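In this subsection, we first formulate the motion of the average $g^* = (p^*, e^{\hat\xi\theta^*})$. The behavior
of the position average $p^*$ is clearly described by

$$\dot p^* = e^{\hat\xi\theta^*} v^{b,*}, \quad v^{b,*} := e^{-\hat\xi\theta^*}\Big(\frac{1}{n}\sum_{i\in\mathcal V} e^{\hat\xi\theta_{wo_i}} v^b_{wo_i}\Big) \tag{25}$$

from the definition $p^* = \frac{1}{n}\sum_{i\in\mathcal V} p_{wo_i}$. Meanwhile, the trajectory of the orientation average
$e^{\hat\xi\theta^*}$ described by (8) satisfies the following lemma.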

Lemma 4: Under Assumption 3, the average $e^{\hat\xi\theta^*}$ is continuously differentiable.

Proof: From the polar decomposition, we get $S(t) = e^{\hat\xi\theta^*}(t)P_S(t)$ [25], where $S(t) =
\frac{1}{n}\sum_{i\in\mathcal V} e^{\hat\xi\theta_{wo_i}}$ and $P_S^2(t) = S^T(t)S(t)$. Under Assumption 3(iii), we have $e^{-\hat\xi\theta^*}(t)S(t) > 0$ and
hence $P_S(t)$ is invertible for all $t \ge 0$. Thus, the average $e^{\hat\xi\theta^*}$ is given by $e^{\hat\xi\theta^*}(t) = S(t)P_S^{-1}(t)$.
From ©, the matrices $S(t)$ and $P_S(t)$ are clearly differentiable from their definitions and hence
$\dot e^{\hat\xi\theta^*}$ is well defined. Moreover, from Assumption 3(i), both $\dot S(t)$ and $\dot P_S(t)$ are continuous
and $\frac{d}{dt}(P_S^{-1}) = -P_S^{-1}(t)\dot P_S(t)P_S^{-1}(t)$ is also continuous, which implies that $\dot e^{\hat\xi\theta^*}(t)$ is also continuous.
Hence, the average $e^{\hat\xi\theta^*}$ is continuously differentiable. This completes the proof. ■
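The polar-decomposition step used in the proof above can be checked numerically. The sketch below is only an illustration under stated assumptions: it forms the arithmetic mean $S$ of a few rotation matrices, computes the polar factors through a singular value decomposition (a standard way to obtain them, not necessarily the one used by the authors), and returns the orthogonal factor as the Euclidean mean. The helper names `rodrigues` and `euclidean_mean` are hypothetical, and the axis-angle vectors are the target orientations as read from the simulation section.

```python
import numpy as np


def rodrigues(v):
    """Rotation matrix exp(hat(v)) for an axis-angle vector v = xi * theta."""
    v = np.asarray(v, dtype=float)
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)
    k = v / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def euclidean_mean(rotations):
    """Return (R, P, S) with S the arithmetic mean and S = R P its polar decomposition."""
    S = np.mean(rotations, axis=0)
    U, sigma, Vt = np.linalg.svd(S)
    R = U @ Vt                        # orthogonal polar factor
    P = Vt.T @ np.diag(sigma) @ Vt    # symmetric factor, P @ P = S.T @ S
    if np.linalg.det(R) < 0.0:
        # Happens only when the rotations are too spread out for the mean to be well defined.
        raise ValueError("arithmetic mean has negative determinant")
    return R, P, S


if __name__ == "__main__":
    axis_angles = [[-0.30, -0.30, -0.30],
                   [-0.30, -0.40, -0.40],
                   [-0.40, -0.30, -0.30],
                   [-0.30, -0.40, -0.30],
                   [-0.30, -0.30, -0.40]]
    R, P, S = euclidean_mean([rodrigues(v) for v in axis_angles])
    print("orthogonality error:", np.linalg.norm(R.T @ R - np.eye(3)))
    print("smallest eigenvalue of P:", np.linalg.eigvalsh(P).min())
```

When the orientations are close to each other, the smallest eigenvalue of $P_S$ stays near 1, so $P_S$ is comfortably invertible, which is the property the proof relies on.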

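Moreover, since $e^{\hat\xi\theta^*}(t) \in SO(3)$ holds for all $t \ge 0$, the derivative $\dot e^{\hat\xi\theta^*}$ has to satisfy $\dot e^{\hat\xi\theta^*} \in
T_{e^{\hat\xi\theta^*}}SO(3)$, where $T_{e^{\hat\xi\theta^*}}SO(3) := \{e^{\hat\xi\theta^*}X \mid X \in so(3)\}$ is the tangent space of the manifold
$SO(3)$ at $e^{\hat\xi\theta^*}$. Namely, the trajectory of the Euclidean mean $e^{\hat\xi\theta^*}$ is described by the differential
equation $\dot e^{\hat\xi\theta^*} = e^{\hat\xi\theta^*}\hat\omega^{b,*}$ with some body velocity $\hat\omega^{b,*} \in so(3)$.

We next clarify a relation between the velocities $V^{b,*} := (v^{b,*}, \omega^{b,*})$ and $V^b_{wo_i} = (v^b_{wo_i}, \omega^b_{wo_i})$, $i \in \mathcal V$.
We first define $w_p := (v^b_{wo_i})_{i\in\mathcal V}$ and $w_R := (\omega^b_{wo_i})_{i\in\mathcal V}$. Since it is easy from (25) to obtain
$\|v^{b,*}\|^2 \le \|w_p\|^2/n$, we mention only a relation between $\omega^{b,*}$ and $w_R$ in the following.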
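For completeness, the position bound mentioned above follows in two steps. The sketch below assumes that $\|w_p\|^2$ denotes $\sum_{i\in\mathcal V}\|v^b_{wo_i}\|^2$ (the squared norm of the stacked velocity vector) and that (25) reads $v^{b,*} = e^{-\hat\xi\theta^*}\frac{1}{n}\sum_{i\in\mathcal V} e^{\hat\xi\theta_{wo_i}}v^b_{wo_i}$. It uses only the facts that rotation matrices preserve the Euclidean norm and the Cauchy-Schwarz inequality.

```latex
\begin{align*}
\|v^{b,*}\|
  &= \Big\| e^{-\hat\xi\theta^*}\,\frac{1}{n}\sum_{i\in\mathcal V} e^{\hat\xi\theta_{wo_i}} v^b_{wo_i} \Big\|
   = \frac{1}{n}\Big\| \sum_{i\in\mathcal V} e^{\hat\xi\theta_{wo_i}} v^b_{wo_i} \Big\| \\
  &\le \frac{1}{n}\sum_{i\in\mathcal V} \|v^b_{wo_i}\|
   \le \frac{1}{n}\,\sqrt{n}\,\Big(\sum_{i\in\mathcal V}\|v^b_{wo_i}\|^2\Big)^{1/2}
   = \frac{\|w_p\|}{\sqrt{n}},
\end{align*}
% squaring both sides gives \|v^{b,*}\|^2 \le \|w_p\|^2 / n.
```

The orientation case does not reduce to such a direct computation because the average $e^{\hat\xi\theta^*}$ is obtained through a projection onto $SO(3)$, which is why it is treated separately.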

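Lemma 5: Suppose that the target orientations $(e^{\hat\xi\theta_{wo_i}})_{i\in\mathcal V}$ satisfy

$$\left\| e^{\hat\xi\theta^*}(t) - S(t) \right\|_F \le \gamma \quad \forall t \ge 0 \tag{26}$$

for some $\gamma > 0$. Then, the following inequality holds.

$$\|\omega^{b,*}(t)\|^2 \le \frac{\mu^2(\gamma)}{n}\|w_R(t)\|^2, \quad \mu(\gamma) := \frac{\sqrt{2}}{\sqrt{2} - \gamma} \tag{27}$$

Proof: See Appendix E. ■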
Though we omit the proof. 



e^'{t)-S{t) 



is also upper bounded by 0^ and hence 7 is 



estimated by prior information on the target orientations. 

B. Tracking Performance 

Let us consider the whole networked system $\Sigma_{track}$ consisting of the relative rigid body motion
(dl) for all $i \in \mathcal V$ and the networked visual motion observer (17). Here, we regard the collection
of body velocities of the target objects $(V^b_{wo_i})_{i\in\mathcal V}$, i.e. $w = (w_p, w_R)$, as the external disturbance
to $\Sigma_{track}$ and evaluate the error between the estimates $(\bar g_{io_i})_{i\in\mathcal V}$ and the average $g^*$ in the presence
of the disturbance $w$. Namely, we let the error $((g^*)^{-1}\bar g_{io_i})_{i\in\mathcal V}$ be the output signal of $\Sigma_{track}$.

Unlike the static objects case, $\rho_p = \frac{1}{2}\sum_{i\in\mathcal V}\|p_{wo_i} - p^*\|^2$ and $\rho_R = \sum_{i\in\mathcal V}\phi(e^{-\hat\xi\theta^*}e^{\hat\xi\theta_{wo_i}})$ are also
time-varying. We thus define the parameters

$$\rho'_p := \sup_t \rho_p(t), \quad \rho'_R := \sup_t \rho_R(t)$$

assuming $\rho'_p, \rho'_R < \infty$, and redefine the sets $\Omega'_p$ and $\Omega'_R$ by just replacing $\rho_p$ and $\rho_R$ in (19) and (20)
by $\rho'_p$ and $\rho'_R$, respectively. The parameters $\rho'_p$ and $\rho'_R$ are the supremum of the distance from
the estimate to the average when $g_{io_i}$ is correctly estimated and hence they are also indicators
of the best average estimation performance in the absence of communication. Note however that
the visual motion observer (14) cannot correctly estimate $g_{io_i}$ as long as the object is moving
with unknown velocity.

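The problem to be considered here is redefined as follows.

Definition 2: The position estimates $(\bar p_{io_i})_{i\in\mathcal V}$ are said to achieve $\varepsilon$-level tracking performance
for a positive scalar $\varepsilon$ if there exists a finite $T$ such that $(\bar p_{io_i}(t))_{i\in\mathcal V} \in \Omega'_p(\varepsilon)\ \forall w \in \mathcal W$ and $t \ge T$,
where $\mathcal W$ is the set of disturbance signals $w(\cdot)$ satisfying Assumption 3. Similarly, the estimates
$(e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i\in\mathcal V}$ are said to achieve $\varepsilon$-level tracking performance if there exists a finite $T$ such that
$(e^{\hat{\bar\xi}\bar\theta_{io_i}}(t))_{i\in\mathcal V} \in \Omega'_R(\varepsilon)\ \forall w \in \mathcal W$ and $t \ge T$.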

In terms of the tracking performance defined above, we have the following theorem. 

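Theorem 2: Suppose that the estimates $\bar g_{io_i}$ are updated according to (17). Under Assumptions
1 and 3, if $k_e > \mu^2(\gamma)$ for $\gamma$ satisfying (26) and $(e^{\hat{\bar\xi}\bar\theta_{io_i}}(t))_{i\in\mathcal V} \in \mathcal S\ \forall t \ge 0$, the estimates $(\bar p_{io_i})_{i\in\mathcal V}$
and $(e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i\in\mathcal V}$ achieve $\varepsilon'_p$- and $\varepsilon'_R$-level tracking performances, respectively, with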



Proof: See Appendix F. ■

This theorem implies that the networked visual motion observer works even for moving target
objects. We also see that the ultimate error between the estimates and the average gets small as
the visual feedback gain $k_e$ becomes large, which is a natural conclusion from the form of (16).
In summary, we have the following conclusion on the gain selection. In order to achieve a
good averaging performance, we should make the mutual feedback gain $k_s$ large relative to the
visual feedback gain $k_e$. In order to achieve a good tracking performance, the visual feedback
gain $k_e$ should be large in absolute terms. Namely, the best selection is to make both gains $k_e$ and
$k_s$ large while keeping the mutual feedback gain $k_s$ much larger than the visual feedback gain $k_e$.
However, the size of $k_s$ is in general restricted by the communication rate, as is usual
in standard feedback control theory. A trade-off then occurs between averaging and tracking
performances, i.e. if we set a large $k_e$, a good tracking performance is achieved at the cost of a
poor averaging performance and vice versa.

Fig. 10. Overview



Fig. 11. Feature Points 



Fig. 12. Communication Graph 



VI. Simulation 

We finally demonstrate the effectiveness of the networked visual motion observer and the validity
of the theoretical results through simulation. Throughout this section, we consider the situation
where five pinhole-type vision cameras with focal length 0.01 [m] see a group of target objects.
We identify the frame of camera 1 with the world frame and let $p_{w2} = [1\ 0\ 0]^T$, $p_{w3} =
[0\ 1\ 0]^T$, $p_{w4} = [0\ {-1}\ 0]^T$, $p_{w5} = [-1\ 0\ 0]^T$, and $e^{\hat\xi\theta_{wi}} = I_3\ \forall i \in \{2, 3, 4, 5\}$. The overview
of the setting is illustrated in Fig. 10, where blue boxes represent the initial configuration of
target objects with $p_{wo_1} = [0.12\ 0.55\ {-2.78}]$, $p_{wo_2} = [0.22\ 0.48\ {-2.85}]$, $p_{wo_3} = [0.33\ 0.33\ {-2.97}]$,
$p_{wo_4} = [0.42\ 0.23\ {-3.08}]$, $p_{wo_5} = [0.56\ 0.12\ {-3.15}]$ and $\xi\theta_{wo_1} = [-0.30\ {-0.30}\ {-0.30}]$,
$\xi\theta_{wo_2} = [-0.30\ {-0.40}\ {-0.40}]$, $\xi\theta_{wo_3} = [-0.40\ {-0.30}\ {-0.30}]$, $\xi\theta_{wo_4} = [-0.30\ {-0.40}\ {-0.30}]$,
$\xi\theta_{wo_5} = [-0.30\ {-0.30}\ {-0.40}]$. All the targets have four feature points whose
positions relative to the object frame are illustrated in Fig. 11. We use the points projected onto
the image plane as visual measurements $f_i$. The communication structure is depicted in Fig. 12
with $W = 1$.

In the first scenario, we consider static target objects and demonstrate the validity of Theorem 1.
Then, the average $g^* = (p^*, e^{\hat\xi\theta^*})$ is given by $p^* = [0.33\ 0.36\ {-2.96}]$, $\xi\theta^* = [-0.32\ {-0.34}\ {-0.34}]$.
For this configuration of the target objects, the parameter $\beta$ is approximately $\beta = 0.86$.
Throughout this section, we let the initial estimates be $\bar p_{io_i}(0) = [0\ 0\ 2.5]^T$ and $e^{\hat{\bar\xi}\bar\theta_{io_i}}(0) = I_3$.

We first employ the gains $k_e = 1$ and $k_s = 0.1$ ($k = 10$). Then, the parameters $\varepsilon_p$ and $\varepsilon_R$
in Theorem 1 are given by $\varepsilon_p = \varepsilon_R = 1$. Fig. 13 illustrates the time responses of the orientation
estimates of all vision cameras produced by the networked visual motion observer, where the red
dash-dotted lines represent each element of the average $\xi^*\sin\theta^*$. We see from the figures that
there exist gaps between the average and the estimates for all elements. The error functions $U_p$
and $U_R$ are depicted by blue curves in Fig. 14, where red dash-dotted lines represent
$\varepsilon_p\rho_p$ and $\varepsilon_R\rho_R$. Namely, Theorem 1 implies that the blue curve eventually takes lower values
than the value indicated by the dash-dotted line, and we see that this is indeed achieved.

Fig. 13. Time Responses of Each Element of $\xi\sin\theta$, $i = 1, \dots, 5$ (Static: $k_s = 0.1$)

We next let $k_e = 1$ and $k_s = 100$ ($k = 0.01$). Then, we have $\varepsilon_p = 0.19$, $\varepsilon_R = 0.31$ for
sufficiently small $\epsilon$ and $c$. Fig. 15 illustrates the time responses of $U_p$ and $U_R$. We see from the
figures that the estimates of all vision cameras become much closer to the average than in the case
of a small mutual feedback gain $k_s = 0.1$. Fig. 15 also indicates that the error functions $U_p$ and
$U_R$ ultimately take lower values than the right-hand sides of (22) and (23), respectively. Namely,
it turns out as predicted that a small $k = k_e/k_s$ results in a good averaging performance.

Fig. 15. Time Response of $U_p$ (Left) and $U_R$ (Right) (Static: $k_s = 100$)

Fig. 16. Time Responses of Each Element of $\xi\sin\theta$, $i = 1, \dots, 5$ (Moving: $k_e = 3$)

Fig. 17. Time Responses of $U_p$ (Left) and $U_R$ (Right) (Moving: $k_e = 3$)

Fig. 18. Time Responses of $U_p$ (Left) and $U_R$ (Right) (Moving: $k_e = 30$)

In the second scenario, we consider moving target objects with constant body velocities V^^. = 

r -iT 

0.2 0.8 Vi = 1, ■ ■ ■ , 5 and the same initial states as the above static case. For 
the targets, we apply the networked visual motion observer with $k_e = k_s = 3$, where we let
the initial estimates be the same as in the above static object case. Then the time responses of the
orientation estimates are depicted in Fig. 16, where red dash-dotted curves describe the average
motion of the target orientations. We see from the figures that the estimates track the moving
average within bounded errors and that the networked observer also works for a dynamic problem.

The responses of $U_p$ and $U_R$ are illustrated in Fig. 17, where the dash-dotted lines show $\varepsilon'_p\rho'_p$
and $\varepsilon'_R\rho'_R$. As shown in Theorem 2, both $U_p$ and $U_R$ ultimately take values smaller than $\varepsilon'_p\rho'_p$
and $\varepsilon'_R\rho'_R$, respectively. Their counterparts for $k_e = 30$, $k_s = 3$ are shown in Fig. 18, which
also illustrates the validity of Theorem 2. We also see that a large $k_e$ achieves a better tracking
performance than a smaller $k_e$, which supports the analysis at the end of Section V.

Experimental verifications on a testbed are omitted in this paper but shown in [27], [28].

VII. Conclusions 

This paper has presented a novel cooperative estimation mechanism for visual sensor networks. 
We have considered the situation where multiple smart vision cameras with computation and 
communication capability see a group of target objects. We first have presented an estimation 
mechanism called networked visual motion observer to meet two requirements, averaging and 
tracking. Then, we have derived an upper bound of the ultimate error between the actual average 
and the estimates produced by the present methodology. Moreover, we have derived an upper 
bound of the ultimate error from the estimates to the average when the target objects are moving. 
Finally, the effectiveness of the present mechanism has been demonstrated through simulation. 

The authors would like to express sincere appreciation to Prof. Francesco Bullo and Prof.
Kenji Hirata for their invaluable suggestions and advice.

Appendix A
Proof of Lemma 1

In the proof, we use the following lemma.

Lemma 6: [15] For any matrices $R_1, R_2, R_3 \in SO(3)$, the inequality

^tr(i?f i?2 - RjRsRlRs) > (l>iRlR3) - <P{RlR2) + A„,„(sym(i?fi?3))0(i?^i?2) 

holds, where $\mathrm{sym}(M) := \frac{1}{2}(M + M^T)$ and $\lambda_{\min}(M)$ is the minimal eigenvalue of the matrix $M$.
The time evolution of the orientation estimate e^^'"^ in © with V^^ = and (fT6l) is given by 

gle.., _ Je.o, ^^^.^ ^^^ . _ keenie-^'-^ J'-' ) + ^s $^ eR{e~^''°^ e^'-^ ) , (28) 
which is independent of the evolution of the position estimate $\bar p_{io_i}$. Multiplying (28) by $e^{\hat\xi\theta_{wi}}$ from
the left, we have the following equation describing the evolution of the estimate $e^{\hat{\bar\xi}\bar\theta_{wo_i}}$ relative to $\Sigma_w$.

Let us now consider the energy function 

Then, the time derivative of U along with the trajectories of (|29l ) is given by 

f/ = 2e5(e-«^*e«^"--)^ne. = -tr (sk(e-«^*e«^--)^«e.) , (30) 

where we use the relation a^h = — |tr(a6). Substituting (|29l) into (l30l) yields 

+A;,( 5^ e-«^*e«^"-- - e-«^*e«'^"--e-«^"™°.e«^-')). (31) 

From Lemma [6l (|3T1) is rewritten as f/ < — (/cg-^i + ksF2), where 

From the definition of the index l, the inequality 0(e^'^^*e^^"'°') > 0(e"'^^'e^^'"°j) Vj G V holds 
and hence we obtain F2 > crXlieM 0(6^^™°' e^ "'°0- Thus, the inequality 

is true. From the assumption of e^^^*e^^'"°i > Vi G V, we have a > and the inequality 

holds. Thus, if 0(e"^^*e^^"'°') - 0(e~'^^*e^^"'°'i) > c, then f/ < -eke is true. Namely, there exists 
a finite r(c) such that e^^'""' satisfies 0(e~^^*ef^'"°') — 0(e~^^*e^^"'°h) < c\/t > t{c) and, from 
the definition of i, we also have 0(e"^^*e^^"'°') — 0(e~^^*e^^™°h) < c Vt > r(c) for all ? G V. 
This completes the proof.

Appendix B
Proof of Lemma 2

In the proof, we use the energy functions 

We first consider evolution of the position estimates {piojiev and then show its counterpart with 
respect to orientation estimates (e^^'°i)igv separately. The time evolution of the position estimate 
Pio, in & with V^i = and ^ is described by pio^ = k^ipio, - PioJ + ks EyeMfeo, - PioJ- 
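Since the cameras are static, the evolution of $\bar p_{wo_i}$ relative to the world frame $\Sigma_w$ is given by

$$\dot{\bar p}_{wo_i} = e^{\hat\xi\theta_{wi}}\dot{\bar p}_{io_i} = k_e(p_{wo_i} - \bar p_{wo_i}) + k_s\sum_{j\in\mathcal N_i}(\bar p_{wo_j} - \bar p_{wo_i}), \tag{32}$$

which is independent of the evolution of the orientation estimates (29).

If we define $q_i := p_{wo_i} - p^*$ and $\bar q_i := \bar p_{wo_i} - p^*$, the time derivative of $U_p$ along the
trajectories of (32) is given by

$$\dot U_p = \sum_{i\in\mathcal V}\Big(k_e\bar q_i^T(q_i - \bar q_i) + k_s\sum_{j\in\mathcal N_i}\bar q_i^T(\bar q_j - \bar q_i)\Big)$$

$$= \frac{1}{2}\sum_{i\in\mathcal V}\Big(k_e\big(\|q_i\|^2 - \|\bar q_i\|^2 - \|q_i - \bar q_i\|^2\big) + k_s\sum_{j\in\mathcal N_i}\big(\|\bar q_j\|^2 - \|\bar q_i\|^2 - \|\bar q_j - \bar q_i\|^2\big)\Big).$$

Since $\sum_{i\in\mathcal V}\sum_{j\in\mathcal N_i}\big(\|\bar q_i\|^2 - \|\bar q_j\|^2\big) = 0$ holds under Assumption 1 [18], we obtain

$$\dot U_p = -\frac{k_e}{2}\sum_{i\in\mathcal V}\big(\|\bar q_i\|^2 - \|q_i\|^2 + \|q_i - \bar q_i\|^2\big) - \frac{k_s}{2}\sum_{i\in\mathcal V}\sum_{j\in\mathcal N_i}\|\bar q_j - \bar q_i\|^2. \tag{33}$$

We see from (33) that if $(\bar p_{io_i})_{i\in\mathcal V} \notin \Omega_p(1)$, then

$$\dot U_p \le -\frac{1}{2}\sum_{i\in\mathcal V}\Big(k_e\|q_i - \bar q_i\|^2 + k_s\sum_{j\in\mathcal N_i}\|\bar q_j - \bar q_i\|^2\Big) \tag{34}$$

holds. From Assumption 2, $\sum_{i\in\mathcal V}\|q_i - \bar q_i\|^2$ and $\sum_{i\in\mathcal V}\sum_{j\in\mathcal N_i}\|\bar q_j - \bar q_i\|^2$ are never equal to 0
simultaneously and hence the right-hand side of (34) is strictly negative. Thus, the trajectories
of the position estimates $(\bar p_{io_i})_{i\in\mathcal V}$ along (32) settle into the set $\Omega_p(1)$ in finite time.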
The time derivative of Un along the trajectories of (|29l ) is given by 

f/K = 2 5^ elie-^'^J'-"^ )uue^, = - 5^ tr (sk(e-«^*e«^"-°^ )c:;„e.) • (35) 

iev iev 






Substituting ^ into ([35]) yields 

Ur = -Y,^riK^i + h^2), (36) 



J6V 



We first consider the term J^i^^i^^) in (|36l) . From Lemma |6l the following inequality holds. 

^ tr($2) > 5^ 5^ U{e-^'* e^'-"^ ) - 0(e-«'*e«''-°^ ) + cr,0(e-«''"-°»e«''-°^ ) } , (37) 

where a, := Amm(sym(e~^''*e^^'^°»)). Assumption [U implies that J^jgv ZljeM *i'^('^~'^^*'^^^"'°') ~ 
0(e"^^*e^^"°j) = O and hence dlV]) is rewritten as 

Xl tr ($2) > ^ ^ ai(j){e-^^-''^ J-^-°^ ). (38) 

We next consider the term ke X^iev ^^("^i) ^^ (l36l) . Applying Lemma |6] again to the term yields 

JGV iev 

Substituting dMl) and dlH) into ([361) yields 

Ur < -^ (fce^le-^'^'e^^"-"-) - A;e0(e-«~^*e«"^-°O 

+ai[ke(l){e~^~^^°^S^^°^) + ksY^ 0(e-«'^"'"°>e^^^'™°o))- (40) 

If {e^^^°i)iev ^ ^/?(1) is true, (HO]) is rewritten as 

f/fi < - ^ (Ji ('A;e0(e~^^^""°' e«"^™°« ) + '^^ XI '^(e"^'^""°' e^^^'™°^ )) • (41) 

Note that, from the assumption of $e^{-\hat\xi\theta^*}e^{\hat{\bar\xi}\bar\theta_{wo_i}} > 0$, we have $\sigma_i > 0$. Since both of the terms
$\sum_{i\in\mathcal V}\sigma_i\phi(e^{-\hat{\bar\xi}\bar\theta_{wo_i}}e^{\hat\xi\theta_{wo_i}})$ and $\sum_{i\in\mathcal V}\sigma_i\sum_{j\in\mathcal N_i}\phi(e^{-\hat{\bar\xi}\bar\theta_{wo_i}}e^{\hat{\bar\xi}\bar\theta_{wo_j}})$ are never equal to 0 under
Assumption 2, the right-hand side of (41) is strictly negative. This implies that the trajectories of the
estimates $(e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i\in\mathcal V}$ converge to the set $\Omega_R(1)$ in finite time.

Appendix C
Proof of Lemma 3

Suppose that 0(e~^^*ef^'"°«) < 0(e~'^^*e^^"'°h) + c holds true for some c > 0. Then, from 
the Hoffman-Wielandt perturbation theorem [33], we have

This immediately means 



A^,„(sym(e-«^*e«^-°0) > /^ := 1 " V2(0(e-«^*e«^-°O + c). (42) 

From Lemma 1 and (42), inequality (40) is rewritten as

iev 

+13 (ke(l)ie-^^-''^ e«"^™°' ) + ksY^ 0(e-«'^'™°- e^^^"™°^ )) ) (43) 

at least after the time t(c). If (e^^^°i)jgv ^ S2{k) holds true, then we have 

f/fi < -K Y^ ('0(e-«^*e«'^""°' ) + /30(e-«'^"-°- e«^-°- )) (44) 

at least after the time t{c). Under Assumption 2, the right-hand side of (|44l) is strictly negative. 
In terms of Up, from ([33]), if feojiev e 5f (A;), we have f/p = -^ ^liev (ll^iP + hi-^if\ 
whose right-hand side is strictly negative under Assumption |2l These complete the proof. 

Appendix D
Proof of Theorem 1

We first consider evolution of the position estimates (piojiev described by (|32|) . The case not 
satisfying k < 1/W is already proved in Lemma [2] and hence we consider the case such that 
k < 1/W is satisfied. Lemmas |2] and [3] indicate that Up < holds in Vlp(l) U S2{k). Namely, 
from the inclusion (J2T)) . we have Up < except for the region i^p{6p) if Up < holds in the 
region S^{k,ep). If it is true, the trajectories along with (|32l) settle into the set ^p{6p) in finite 
time. It is thus sufficient to prove that (ip is strictly negative for all (pjojiev ^ S^{k,ep). 
Equation (33) is rewritten as

$$\dot U_p = -\frac{k_e}{2}\sum_{i\in\mathcal V}\Big(-\|q_i\|^2 + \|\bar q_i\|^2 + (1 - \epsilon)\|q_i - \bar q_i\|^2\Big) - a_p, \tag{45}$$

where $a_p := \frac{1}{2}\sum_{i\in\mathcal V}\Big(k_e\epsilon\|q_i - \bar q_i\|^2 + k_s\sum_{j\in\mathcal N_i}\|\bar q_j - \bar q_i\|^2\Big)$ is strictly positive under Assumption
2. Now, for any $a \in (0, 1)$ and $j^* \in \mathcal V$, we have

$$\|q_i - \bar q_i\|^2 \ge a\|q_i - \bar q_{j^*}\|^2 - \frac{a}{1 - a}\|\bar q_i - \bar q_{j^*}\|^2. \tag{46}$$

Let j* be a node satisfying j* = argminjg -D(io) and G^ = (V, £^) E T{j*) be a graph satisfying 
G^ = argminGrero*) D{Gt), where £> and D are defined in dH). Then, we obtain 



ki-?i*f 



2 



ie{o,- ,dG-^(i)-~i} ze{o,- ,a!Gj W-i} 



where {vQ{i), ■ ■ ■ , Vd^* (i)_i(i)) is the path from root j* to node i along tree G^. Namely, 

XI ll^i - ^i*f ^ X^^tW X ll^^'« ~ ^^i+i«f • (47) 

iGV ieV ;g{o,...,dg. (j)-l} 

holds. For any edge E = {v^, t>^) of G^, the coefficient of ||g^i — qu2 |p in the right hand side of 
(l47l) is given by Xliev ^g* i^'i O'^c* (^)» which is upper-bounded by D{G^) = W. We thus have 

iev £;=(i)i,-u2)g£-* iev jeMr 

The latter inequality of (|48l) holds because G^- is a subgraph of G^. Since (piojiev £ '5f (A;, ^p), 
the inclusion (pjojjgv ^ '52(A;) holds and hence 

X X 11^' " ^ill' = ZlZl 11^-°' -Pwo.f < 2kpp. (49) 

Moreover, the following inequality holds from the definition of the average p*. 

From (|46]), (gS]), 1491) and dSO]), equation ([45]) is rewritten as 



Up<K 



I iev J 

If $(\bar p_{io_i})_{i\in\mathcal V} \in \mathcal S_1^p(k, \varepsilon_p)$, then $(\bar p_{io_i})_{i\in\mathcal V} \notin \Omega_p(\varepsilon_p)$ and hence (51) is rewritten as

$$\dot U_p < k_e\Big\{(1 - \epsilon)\Big(a - \frac{akW}{1 - a} - \big(1 - \sqrt{kW}\big)^2\Big)\Big\}\rho_p - a_p.$$

Under the assumption that $k < 1/W$, the inequality $a - \frac{akW}{1-a} \le \big(1 - \sqrt{kW}\big)^2$ holds for any
$a \in (0, 1)$ and hence $\dot U_p \le -a_p < 0$. This completes the proof of the former half of the theorem.
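The scalar inequality used in the last step can be verified by elementary calculus. The following worked computation is a sketch that assumes the left-hand side reads $f(a) = a - \frac{akW}{1-a}$, which is the form suggested by the coefficient $\frac{a}{1-a}$ in (46); the symbol $f$ is introduced here only for illustration.

```latex
\begin{align*}
f(a) &:= a - \frac{a\,kW}{1-a}, \qquad a \in (0,1), \quad 0 < kW < 1, \\
f'(a) &= 1 - \frac{kW}{(1-a)^2} = 0
  \;\Longleftrightarrow\; 1 - a = \sqrt{kW}
  \;\Longleftrightarrow\; a^\star = 1 - \sqrt{kW} \in (0,1), \\
f''(a) &= -\frac{2\,kW}{(1-a)^3} < 0
  \quad\text{(so $a^\star$ is the unique maximizer)}, \\
f(a^\star) &= \big(1-\sqrt{kW}\big) - \frac{\big(1-\sqrt{kW}\big)kW}{\sqrt{kW}}
  = \big(1-\sqrt{kW}\big)\big(1-\sqrt{kW}\big)
  = \big(1-\sqrt{kW}\big)^2 .
\end{align*}
```

Hence $f(a) \le (1 - \sqrt{kW})^2$ for every $a \in (0,1)$, with equality at $a^\star = 1 - \sqrt{kW}$. The same computation applied to $a\beta - \frac{akW}{1-a}$ with $kW < \beta$ gives the maximizer $1 - \sqrt{kW/\beta}$ and the maximum $(\sqrt{\beta} - \sqrt{kW})^2$, which is the bound appearing in the orientation case.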

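We next consider the evolution of the orientation estimates $(e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i\in\mathcal V}$ described by (29). The
case not satisfying $k < \beta/W$ or $\beta > 0$ is already proved in Lemma 2. We thus consider the case
such that $k < \beta/W$ and $\beta > 0$ hold. We first note that the set $\mathcal S = \{(e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i\in\mathcal V} \mid e^{-\hat{\bar\xi}\bar\theta_{wo_i}}e^{\hat\xi\theta^*} > 0\
\forall i \in \mathcal V\}$ is a positively invariant set from Lemma 1 and hence trajectories of $(e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i\in\mathcal V}$ starting
from $\mathcal S$ never leave $\mathcal S$. Lemmas 2 and 3 also prove that, in the region $\mathcal S$, $\dot U_R < 0$ holds
if $(e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i\in\mathcal V} \in (\mathcal S \setminus \Omega_R(1)) \cup \mathcal S_2^R(k)$ at least after the time $\tau(c)$. Namely, as long as $\dot U_R < 0$ is
true in the region $\mathcal S_1^R(k, \varepsilon_R)$, the inequality $\dot U_R < 0$ holds except for the region $\Omega_R(\varepsilon_R)$ from the
inclusion (21), which means that the trajectories along (29) settle into the set $\Omega_R(\varepsilon_R)$ in finite
time. It is thus sufficient to prove that $\dot U_R$ is strictly negative for all $(e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i\in\mathcal V} \in \mathcal S_1^R(k, \varepsilon_R)$.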

We first notice that if we define aR := P Xliev (kee4>{e~^'^'^°^ e^^^°^)+ks Y^j^m, 0(6"^"^™°" e^"^"""^ ) j 
ttR is strictly positive under Assumption [21 Using the parameter ur, (l43l) is rewritten as 

Ur< -keJ2(- 0(e"^~^*e«"^™°O + 0(e-«^*e«'^"™°O + /3(1 - e)0(e-«^'-°>e«"^'"°o) - «R- (52) 
We thus consider the former three terms of the right hand side of Inequality (|52l) . We first have 

1 — « 
for any a E (0, 1) and j* E V. Again, let j* be a node satisfying j* = argminj^ /^(io) and 
G^ = {V,£^) E T{j*) be a graph satisfying G^ = argmincrgro*) D{Gt)- Then, the inequality 

0(e-«'^"-°.* e«^"-°- ) < dcj {{} Yl 0(e"^'^"°"H') e^^"'°"'+i« ) (54) 

holds from the definition of the energy function (p and hence 

ieV ieV iG{0,---,dG* (i)-l} 

Similarly to the case of position estimates, (l55l) is rewritten as 

J2 (Pie'^^^"^* e«^"™°' ) < ly J2 0(e"^'^""°-^ e^'^"™"-^ ) < ly ^ ^ 0(e-«^"™°' J'^^"^ ) . (56) 

Since (e^^"°i)igv e 5|^(fc,£:ij,), the inclusion (e^^"°i)igv ^ '5|^(^) holds and hence 
is true. We next focus on 0(e ™°j*e^^'"°i) in (|53l ). From the definition of the average e^^* dV]),

^ (f){e-^'-''r e«^-°» ) > ^ 0(e-«^* e«^-°- ) = p^ (58) 

iev iev 

holds for any e^^'""^ G 5*0(3). Substituting ^, ^, dST]) and dSS]) into inequality ^ yields 



A;iya 



Un<-h{{Yl <Pie-^'' e^'-^)) - (l - (1 - e) (a/3 - ^))pn} - an- (59) 

If (ef^^°Ojev G '^I^I^jS^r). then (ef^^°Ojev ^ ^(^r) and hence ^5^ is rewritten by 

Ur < K{1 - e) (a/3 - J— - ( v^ - 7^1^] jp^ - an. (60) 

Let us now notice that, under $k < \beta/W$, $a\beta - \frac{akW}{1-a} \le \big(\sqrt{\beta} - \sqrt{kW}\big)^2$ holds true for any
$a \in (0, 1)$ and hence $\dot U_R \le -a_R < 0$. This completes the proof of the latter half of the theorem.

Appendix E 
Proof of Lemma [5] 

Suppose that S{t) moves from S{t) = S to Sit + t^) = S + AS. We also describe e^^*{t + tA) 
as e^'^'it + tA) = e«^*(t) + Ae«^*, Ae?'^* := Proj(5 + AS) - e^'^'it). Then, if \\AS\\l < s is 
true for some s, it is proved in [i34il that under (|26|) 

sup II Ae^^ll < b := 4(1 - (1 - ^2(^)^/2)1/2) < ^2^^^- ^g^^ 

The hypothesis of II Ae^^''* III, > ^^2(7) || AS |||, contradicts dM]) and hence HAe^^'^'Hl, < /i2(7)||AS'||| 
From continuous differentiability of the average e^^* , we also get 

\\co''*f = ||e«^l^ = II lim Aj''/tA\\l = lim ||Ae«^7tA||?. 

tA->0 iA-s-0 

< lim /i2(7)||A5/tA||| = p'(7)ll lim A5/tA||^ = /i2(7)||^|||. 

iA->0 tA-s>0 

It is clear that n||S'(t)|||, < ||w(t)|p holds and hence ^ is true. 

Appendix F 
Proof of Theorem [2] 

We first consider the statement in terms of the position estimates. The time derivative of Up 
along the trajectories of the system T^track is given by 

f>P = E^t(e^^'-°-t'.e-e«^*t;'''*). (62) 

iev 
From Lemma 2, we obtain

J^gf e«^^-^;„e < I Edl^'ll' - ll^^^in ^ ^^P'v - y E ll^^^ll' (63) 

under Assumptions [T] and [3l In addition, under Assumption [3l the second term of (|62l ) satisfies 

-L X ^ / 11 ca* f. j-iio 11 iio 11 ca* ^-noX . -L 



igv iev iev 

Substituting ^^ and dill) into ([621) yields 



f7, < 5^ A:,p; - (fee - 1) (^ 5^ llg.ir) + ^ 



P 



«^'. (65) 



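Now, we see from (65) and the definition of $\varepsilon'_p$ that $\dot U_p < 0$ as long as $(\bar p_{io_i})_{i\in\mathcal V} \notin \Omega'_p(\varepsilon'_p)$. Hence,
the function $U_p(t)$ is monotonically strictly decreasing in the region and there exists a finite time
$T$ such that $(\bar p_{io_i})_{i\in\mathcal V} \in \Omega'_p(\varepsilon'_p)\ \forall t \ge T$.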

We next consider the evolution of orientation estimates. The time derivative of Ur along the 
trajectories of the system T^track is given by 



Ur = 2Y,el,{e-^''S'-^){uJue^ - ^'n- (66) 

iev 

From Lemma [2l we obtain 

iev 
< Kp'n -keJ2 0(e-«''*e^^"-°« ) (67) 



iev iev 



iev 



under the assumption of (e^^'°i(t))jgv G 5 Vt > and Assumptions [T] and [3l We also have 



.g-^e-g^e^o^x fe,* 



-2E^S( 

iev 



|a;^'*||2 



iev 



< E (/^'(7)l|ei^(e-^"^*e^^^-)f) + ^||a;^'*f < $^ [f^\j)\\en{e'^'' J'-^^f) + w^m 



^ev /^'(^) .ev 
where the last inequality holds from Lemma [51 Since ||eij(e~'^^*e^''"'°i)|p < 0(e^^^*e^^'"°«) is
true, substituting (|67l) and (|68l) into (|66l) yields 

?7ij < fcep'^ + ^^ - {K - /i'(7)) E </'(e-«''*e«^"»°0- 

Now, if $(e^{\hat{\bar\xi}\bar\theta_{io_i}})_{i\in\mathcal V} \notin \Omega'_R(\varepsilon'_R)$ holds, then $\dot U_R < 0$. Hence, the function $U_R(t)$ is monotonically
strictly decreasing in the region and this completes the proof.